Deriving Benefit from a Generalized Syntax-based Reordering

Authors

  • Maxim Khalilov
  • José A.R. Fonollosa
  • Mark Dras
Abstract

In this study we describe a syntax-based word reordering technique for n-gram-based statistical machine translation (SMT). The proposed distortion model operates with generalized unlexicalized rules and aims to order source-language words so that translation is close to monotonic, simplifying the translation process. In the final step, we apply a translation-unit blending strategy, combining bilingual tuples extracted from the parallel corpora with monotone and reordered source parts. Experiments are reported on the BTEC corpus from the tourist domain for the Arabic-English translation task; the proposed tuple blending technique significantly outperforms the monotone system.
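As a rough illustration of what an unlexicalized reordering rule does, the sketch below rewrites a tagged source sentence so that its word order is closer to the target language. It is not the authors' model: the rule table, the tag names, and the function `reorder` are hypothetical, and the rules match flat tag sequences rather than the syntactic parse structures a syntax-based system would use. The tokens are English glosses, not real Arabic words.

```python
from typing import Dict, List, Tuple

Token = Tuple[str, str]  # (word or gloss, tag)

# Hypothetical unlexicalized rules: a tag pattern mapped to a permutation
# of its positions. No words appear in the rules, only tags.
RULES: Dict[Tuple[str, ...], Tuple[int, ...]] = {
    ("VBD", "NN"): (1, 0),  # verb-subject  -> subject-verb
    ("NN", "JJ"): (1, 0),   # noun-adjective -> adjective-noun
}


def reorder(tokens: List[Token],
            rules: Dict[Tuple[str, ...], Tuple[int, ...]] = RULES) -> List[Token]:
    """Scan left to right and apply the first rule whose tag pattern matches."""
    out: List[Token] = []
    i = 0
    while i < len(tokens):
        for pattern, perm in rules.items():
            window = tokens[i:i + len(pattern)]
            if tuple(tag for _, tag in window) == pattern:
                # Emit the matched tokens in the permuted order.
                out.extend(window[j] for j in perm)
                i += len(pattern)
                break
        else:
            # No rule matched at this position: keep the token in place.
            out.append(tokens[i])
            i += 1
    return out


# Source-order glosses for a verb-initial sentence (illustrative only).
sentence = [("wrote", "VBD"), ("the-boy", "NN"), ("letter", "NN"), ("long", "JJ")]
reordered_words = [word for word, _ in reorder(sentence)]
print(" ".join(reordered_words))
```

Because the rules refer only to tags, a single rule covers every sentence with the same structure, which is the sense in which such rules generalize.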

Similar resources

A Data Mining Approach to Learn Reorder Rules for SMT

In this paper, we describe a syntax-based source-side reordering method for phrase-based statistical machine translation (SMT) systems. The source-side training corpus is first parsed, then reordering rules are automatically learnt from source-side phrases and word alignments. Later the source-side training and test corpora are reordered and given to the SMT system. Reordering is a common problem...

Phrase Reordering Model Integrating Syntactic Knowledge for SMT

Reordering model is important for the statistical machine translation (SMT). Current phrase-based SMT technologies are good at capturing local reordering but not global reordering. This paper introduces syntactic knowledge to improve global reordering capability of SMT system. Syntactic knowledge such as boundary words, POS information and dependencies is used to guide phrase reordering. Not on...

A Direct Syntax-Driven Reordering Model for Phrase-Based Machine Translation

This paper presents a direct word reordering model with novel syntax-based features for statistical machine translation. Reordering models address the problem of reordering source language into the word order of the target language. IBM Models 3 through 5 have reordering components that use surface word information but very little context information to determine the traversal order of the sour...

Constituent Reordering and Syntax Models for English-to-Japanese Statistical Machine Translation

We present a constituent parsing-based reordering technique that improves the performance of the state-of-the-art English-to-Japanese phrase translation system that includes distortion models by 4.76 BLEU points. The phrase translation model with reordering applied at the pre-processing stage outperforms a syntax-based translation system that incorporates a phrase translation model, a hierarchi...

Publication date: 2008